EXECUTIVE SUMMARY


This report proposes acceptability criteria for validating the modeling and
simulation of a generic tracking radar. The validation process is limited to the
comparison of a set of Monte Carlo realizations of the simulated time series of
judiciously selected validation metrics with single discrete-event observations made
by the actual system. Our approach is based on a statistical hypothesis test. The
two hypotheses are (1) the hypothesis that the simulation is consistent with actual
system performance, the null hypothesis, $H_0$, and (2) the hypothesis that the
simulation is inconsistent with actual system performance, the alternative hypothesis,
$H_1$. The proposed procedure is cognizant of the so-called model maker's risk, $\alpha$, and
the so-called model user's risk, $\beta$, corresponding to the probabilities of Type I and
Type II errors, respectively. For each validation metric, we count the number of
samples of the observed time series that fall outside of bounds prescribed by the Monte
Carlo realizations of the simulated time series. Subsequently, if the number of
observed samples that are outside of the simulation bounds is above a pre-computed
rejection threshold, $\gamma$, computed based on a pre-specified model maker's risk, $\alpha$, we
declare the simulated time series of the particular validation metric under scrutiny
inconsistent with the observed time series. Any statistical dependence present
in the time series of the validation metrics is accounted for in the computation of
the rejection threshold, $\gamma$. The number of Monte Carlo realizations also impacts the
computation of $\gamma$.

Results are summarized in a so-called scorecard. For each discrete-event
observation, the scorecard contains a list of rejection indices for the different
validation metrics, each rejection index being expressed as a number between 0 and
100 denoting the percentage of the samples of the observed time series of the associated
validation metric that are outside of the simulation bounds. Normalized rejection
thresholds for the different validation metrics, also expressed as numbers between
0 and 100, are included in the scorecard as well. The scorecard reveals any
cross-correlation that exists among select validation metrics. Because the
probability density function of the observed behavior is unavailable, which prevents us from
computing the model user's risk, $\beta$, we require that a family of normalized
rejection thresholds, corresponding to different values of the model maker's risk, $\alpha$, be
included in the scorecard. Using sound judgement and common sense, a validation
agent may apply the scorecard to accept or reject a given modeling and
simulation product. The scorecard has the added advantage of serving as a diagnostic
tool, thus aiding in modeling and simulation improvement.
1. INTRODUCTION


The best way to test the performance of a sensor is to repeatedly conduct experiments using
that sensor. For example, in the case of a tracking radar, we would collect measurements originating
from a known target and, using the tracking filter implemented within the radar software, form
a track based on those measurements. We would then compute the target state estimation error
with reference to the target's true state, known a priori, and evaluate the performance of the
tracking filter using the tried and true statistical methods discussed in classic textbooks such as [1].
While optimal, such experiments are, unfortunately, often not cost-effective. Worse, they are almost
never repeatable. For example, we cannot expect the environment in which the sensor operates
to remain constant, varying weather conditions being a favorite anecdote. Therefore, it is often
more economical to operate within a simulated environment, wherein experiments can be tightly
controlled and repeated ad nauseam. In order for the simulation to be trusted as a proxy for
the observed behavior, we need a way of evaluating the accuracy of the models enabling
the simulation. The discipline of modeling and simulation (M&S) verification, validation, and
accreditation (VV&A) is as much an art form as a science. A lucid and sobering account of many
remaining M&S VV&A challenges can be found in [2].

In this report, we focus on validating the modeling and simulation of a generic tracking radar.
The proposed validation criteria can be extended to other radar functions as well. For rigor's sake,
we abide by the following definitions from [3]:

• Verification: "The process of determining that a model implementation and its associated
data accurately represents the developer's conceptual description and specifications."

• Validation: "The process of determining the degree to which a model and its associated
data are an accurate representation of the real world from the perspective of the intended
uses of the model."

• Accreditation: "The official certification that a model, simulation, or federation of models
and simulations and its [sic] associated data are acceptable for use for a specific purpose."

• Acceptability Criteria: "A set of standards that a particular model, simulation, or
federation must meet to be accredited for a specific purpose."

The purpose of this report is to address specifically the design of acceptability criteria appropriate
for validating the modeling and simulation of a generic tracking radar following the simulation
validation guidelines provided in [4]. The techniques we propose would directly benefit the "validation
agent," who, according to [3], is "[t]he person or organization designated to perform validation of
a model, simulation, or federation of models and/or simulations and the associated data."

In an effort to devise effective acceptability criteria, we aim to satisfy three objectives. First,
we note the crucial point that the modeling and simulation product must be able to replicate the
sensor's behavior irrespective of its performance. In other words, if the sensor is expected to perform
poorly under certain conditions, then we would like the modeling and simulation of the sensor to
replicate the same poor performance; otherwise, for testing purposes, we would not be able to rely
on the simulation as a true surrogate for the sensor. Thus, the model maker must not confuse
sensor performance with sensor performance replication. Unfortunately, in our experience, many a
good model maker has fallen prey to an inability to make this important distinction, a phenomenon
we refer to as the model maker's fallacy. Such a fallacy tends to occur more frequently when the
model maker is also the equipment maker.

Our second objective is to ensure that the acceptability criteria for validating the modeling
and simulation of a given sensor are "anchored" to the behavior observed by that sensor, such
as observations of targets of opportunity in the case of tracking radars. Specifically, we aim at
validating the repeated behavior exhibited by a given modeling and simulation product against a
single discrete-event observation (say, of a single satellite pass in the case of a tracking radar)
through the use of sound statistical techniques. Unfortunately, we almost never have access to the
probability distribution functions of uncertainties affecting the observed behavior of the sensor.
Nevertheless, we must devise criteria that minimize the risk to the model user or, more precisely,
to the validation agent who is responsible for passing or failing a given modeling and simulation
product.

Our third, and possibly most important, objective is that the requisite validation metrics
should actually aid in improving the modeling and simulation of the sensor. In other words, we
seek to devise a set of acceptability criteria that not only would allow us to pass or fail a given
modeling and simulation product but also, in case of failure, would serve as a diagnostic tool to
help us identify the sources of failure. By satisfying this objective, the validation agent will be able
to make a more informed decision about the overall performance of the modeling and simulation
product. When acceptability criteria are tied to "physics," it becomes easier to identify statistical
outliers, and their impact on the simulation validation process can thus be minimized.

Our treatise begins with an outline of a statistical decision-theoretic approach to modeling and
simulation validation in Section 2. Here, we discuss risks and benefits from the points of view of
the model maker and the model user. A decision-theoretic approach to modeling and simulation
validation is by no means original (see [5] for a summary of approaches). Of particular value is
an extension of the statistical testing procedure for the equality of the power spectral densities
of multiple short-memory time series devised by [6] to modeling and simulation validation. The
modeling and simulation validation approach proposed in this report is different and unique in that
it combines results from a specific statistical hypothesis test with physical constraints imposed by
judiciously selected validation metrics to allow for an informed and efficient decision-making process,
with the added bonus of providing a road map for modeling and simulation improvement.

In Section 3, we elaborate on how to account for any statistical dependence that is present in
the time series of the validation metrics relevant to the modeling and simulation of a tracking radar.
In this section, we also answer the often-asked question: "How many Monte Carlo realizations are
sufficient to validate a given modeling and simulation product using the method proposed in this
report?" We list the validation metrics relevant to the modeling and simulation of a tracking radar
in Section 4. Via a controlled numerical experiment, we examine the effectiveness of the proposed
method in Section 5. A summary of our results is given in Section 6.
2. STATISTICAL HYPOTHESIS TESTING


We can formulate the simulation validation process as a statistical hypothesis test [5]. We
consider two hypotheses. We define the null hypothesis, $H_0$, to be the hypothesis that the simulation
is consistent with actual system performance, while we define the alternative hypothesis, $H_1$, to
be the hypothesis that the simulation is inconsistent with actual system performance. Ideally,
we would accept a valid simulation when $H_0$ is true, and we would reject an invalid simulation
when $H_1$ is true. However, due to the statistical nature of the problem, two types of decision error
can arise. The so-called Type I error would correspond to rejecting a valid simulation, while the
so-called Type II error would correspond to accepting an invalid simulation. In the modeling and
simulation literature, the probability, $\alpha$, of Type I error is often referred to as the model maker's
risk, and the probability, $\beta$, of Type II error is often referred to as the model user's risk [5].

The validation problem lies in "detecting" a simulation that is inconsistent with actual system
performance. For an optimal solution, one could, in theory, invoke the Neyman-Pearson theorem
to devise a "detector" that minimizes the model user's risk, $\beta$, for a given model maker's risk,
$\alpha$ [7]. In other words, the model maker's risk, $\alpha$, is treated as a parameter of the decision problem:
it is used to compute a "rejection threshold," $\gamma$, for an appropriate "test statistic" of a chosen
"validation metric." If the test statistic is observed to exceed the rejection threshold, then, for the
particular validation metric under consideration, the simulation is deemed to be inconsistent with
actual system performance. When more than a single metric is to be considered, correlations
among the metrics must be taken into account. For modeling and simulation of tracking radars,
validation metrics come in the form of time series as opposed to single scalars. Hence, temporal
correlations present in the time series must also be taken into account. A list of the metrics proposed
for the validation of a given tracking radar simulation is given in Table 1, Section 4.

The computation of the likelihood ratio needed for the design of a Neyman-Pearson detector
demands a priori knowledge of the probability distribution functions (PDFs) of both the
simulation and the actual system results, at least to within a normalizing constant. Generally, we do not
have access to an accurate representation of the PDF of actual system results. Due to the nonlinear
nature of the models and the presence of a large number of random contributors, we often have
no choice but to resort to Monte Carlo sampling techniques to derive PDFs numerically. While
simulations are repeatable, experiments involving actual systems might not be. This is certainly
the case for experiments involving tracking radars. Hence, we have no choice but to treat the PDF
of the actual system results as unknown. Fortunately, we can still compute a rejection threshold,
$\gamma$, based on a given model maker's risk, $\alpha$, since the computation of $\gamma$ depends only on the PDF of
the simulation results [7], which can in general be estimated from a histogram of the Monte Carlo
samples. However, since we do not have access to the PDF of the actual system results, we cannot
guarantee the model user's risk, $\beta$, to be a minimum.
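The claim that $\gamma$ depends only on the simulation's PDF can be illustrated with a minimal Python sketch. It assumes a scalar test statistic for which large values indicate inconsistency, and it uses the empirical $(1-\alpha)$ quantile of the Monte Carlo samples in place of a histogram-based PDF. The function name and the Gaussian stand-in statistic are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def threshold_from_simulation(sim_statistic, alpha):
    """Empirical (1 - alpha) quantile of a test statistic simulated under H0.

    Only Monte Carlo samples of the simulation are used: a statistic drawn
    from the same distribution exceeds the returned threshold with
    probability close to alpha (the model maker's risk).
    """
    samples = np.asarray(sim_statistic, dtype=float)
    return float(np.quantile(samples, 1.0 - alpha))

# Hypothetical test statistic: 10,000 Monte Carlo draws of a standard normal.
rng = np.random.default_rng(0)
sim = rng.standard_normal(10_000)
gamma = threshold_from_simulation(sim, alpha=0.05)
print(gamma)  # close to 1.645, the 95th percentile of a standard normal
```

Nothing in this computation involves the distribution of the actual system results, which is why $\beta$ remains uncontrolled.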

This report presents a procedure, inspired by the aforementioned decision-theoretic concepts,
in which multiple Monte Carlo realizations of time series corresponding to key validation metrics,
obtained by running the simulation software multiple times, are compared with results obtained by
the actual system during a single discrete-event observation. In order to apply this procedure, we
begin by counting, for each validation metric, the number of times that independently sampled values
of the corresponding time series observed by the actual system fall outside of bounds prescribed
by the simulation. The simulation bounds are set to the minimum and maximum values of the
Monte Carlo realizations of the time series valid at each time index of the observed values. We
could have set the simulation bounds to $n$ standard deviations about the mean of the Monte
Carlo realizations. However, this approach would be accurate only for validation metrics that have
a Gaussian probability density function. Unfortunately, many of the validation metrics, such as
the total position error listed in Table 1, Section 4, have probability density functions that are
significantly different from the Gaussian PDF. We thus opt for setting the simulation bounds to
the minimum and maximum values of the Monte Carlo realizations in lieu of setting bounds based
on explicitly derived PDFs.
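The bounding step can be sketched in a few lines of Python, assuming the Monte Carlo realizations and the observed series have already been placed on a common time grid (an assumption; how time indices are aligned is not specified here). The function names are hypothetical.

```python
import numpy as np

def simulation_bounds(mc_realizations):
    """Pointwise min/max envelope of the Monte Carlo realizations.

    mc_realizations: array of shape (N_MC, n_times); each row is one simulated
    time series sampled at the same time indices as the observation.
    """
    mc = np.asarray(mc_realizations, dtype=float)
    return mc.min(axis=0), mc.max(axis=0)

def outside_bounds(observed, lower, upper):
    """Boolean mask: True where an observed sample falls outside the envelope."""
    observed = np.asarray(observed, dtype=float)
    return (observed < lower) | (observed > upper)

mc = np.array([[0.0, 1.0, 2.0],
               [1.0, 3.0, 0.5],
               [0.5, 2.0, 1.0]])
lo, hi = simulation_bounds(mc)       # lo = [0.0, 1.0, 0.5], hi = [1.0, 3.0, 2.0]
obs = np.array([1.5, 2.0, 0.2])
print(outside_bounds(obs, lo, hi))   # [ True False  True]
```

Because the envelope is built from order statistics rather than from a mean and a standard deviation, no Gaussian assumption enters this step.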

The simulation is declared to be inconsistent with actual system performance if the number
of times that independent samples of the observed time series fall outside of the simulation bounds
exceeds a pre-computed rejection threshold, $\gamma$. Since the samples are chosen to be statistically
independent, the outcome of this process can be modeled with a binomial random variable, $x$, with
cumulative distribution function:


$$
\Pr\{x \le n\} = \sum_{k=0}^{n} \binom{N_i}{k}\, p^{k}\, (1-p)^{N_i-k} \qquad (1)
$$


The form of the cumulative distribution function depends on the number, $N_i$, of independently sampled
values of the time series associated with the validation metric under scrutiny and on the probability,
$p$, that a single sample of the observed time series falls outside of the simulation bounds. As a
result, the mapping of the model maker's risk, $\alpha$, to the rejection threshold, $\gamma$, also depends on
these parameters.
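The mapping from $\alpha$ to $\gamma$ can be sketched by inverting Eq. (1) numerically. This sketch assumes the decision rule stated above (reject only when the count exceeds $\gamma$), so $\gamma$ is taken as the smallest integer for which $\Pr\{x > \gamma\} \le \alpha$; the function names and the example values of $N_i$, $N_{MC}$, and $\alpha$ are hypothetical.

```python
from math import comb

def binomial_cdf(n, n_indep, p):
    """Eq. (1): Pr{x <= n} for x ~ Binomial(n_indep, p)."""
    return sum(comb(n_indep, k) * p**k * (1.0 - p)**(n_indep - k)
               for k in range(n + 1))

def rejection_threshold(n_indep, p, alpha):
    """Smallest integer gamma with Pr{x > gamma} <= alpha under H0.

    Rejecting only when the outlier count exceeds gamma keeps the
    probability of a Type I error (model maker's risk) at or below alpha.
    """
    for gamma in range(n_indep + 1):
        if 1.0 - binomial_cdf(gamma, n_indep, p) <= alpha:
            return gamma
    return n_indep

n_mc, n_indep, alpha = 10, 20, 0.05
p = 2.0 / (n_mc + 1)                            # Eq. (2)
gamma = rejection_threshold(n_indep, p, alpha)
print(gamma)                                    # 7
```

Because $x$ is discrete, the achieved Type I error is generally below the requested $\alpha$; in this example $\Pr\{x > 7\}$ is about 0.019.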

At the time of simulation validation, the number of Monte Carlo trials is fixed; that is, the
validation agent is given a predetermined set of Monte Carlo realizations of the time series of the validation
metrics, along with a single time series observed by the actual system. From the number, $N_{MC}$,
of Monte Carlo trials, we can easily show that the probability, $p$, that a single sample of the time
series of a given validation metric observed by the actual system falls outside of bounds prescribed
by the Monte Carlo realizations is given by the simple expression


$$
p = \frac{2}{N_{MC} + 1} \qquad (2)
$$
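Eq. (2) follows from exchangeability: under $H_0$, the observed sample and the $N_{MC}$ simulated samples at a given time index are independent draws from the same continuous distribution, so each of the $N_{MC}+1$ values is equally likely to be the smallest or the largest, and the observed sample lies outside the min/max envelope exactly when it is one of these two extremes. A quick Monte Carlo check in Python, assuming standard normal draws purely for illustration (any continuous distribution gives the same result); the function name is hypothetical:

```python
import numpy as np

def empirical_outside_probability(n_mc, n_trials=100_000, seed=1):
    """Fraction of trials in which one 'observed' draw falls outside the
    min/max envelope of n_mc 'simulated' draws from the same distribution."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_trials, n_mc + 1))
    sim, obs = draws[:, :n_mc], draws[:, n_mc]
    outside = (obs < sim.min(axis=1)) | (obs > sim.max(axis=1))
    return float(outside.mean())

for n_mc in (4, 10, 20):
    print(n_mc, empirical_outside_probability(n_mc), 2 / (n_mc + 1))
```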


In order to judge whether the sampled values of the observed time series fall outside of the
simulation bounds more often than expected, we need to know something about the statistical dependence of
those samples. In other words, we must account for any temporal correlations present in the time
series of the scrutinized metrics in order to perform a meaningful, fair, and robust test. In the
following section, we give a detailed account of the impact of temporally correlated time series on
the simulation validation procedure. However, before handling correlated time series, we must first
address an apparent concern with regard to a simulation validation procedure that is based on
bounds prescribed by the simulation itself.
2.1 THE LAZY MODEL MAKER'S PARADOX


One might be tempted to think that the passing or failing of a given modeling and simulation
product based on bounds set by the simulation itself would allow the model maker to devise a
model that could be guaranteed to be consistent with all observations at all times. Consider the
following gedankenexperiment. Given the modeling and simulation validation procedure outlined
above, a lazy model maker, in an attempt to guarantee success, may naively decide to broaden
the probability density functions of model uncertainties impacting the simulation output. This
way, the time series of the validation metrics observed by the actual system would always fall
within the simulation bounds, or so the model maker hopes. For example, in the case of a tracking
radar, in order to avoid the explicit modeling of unanticipated systematic errors, such as temporally
correlated measurement errors induced by the random heaving and tilting motion experienced by
a shipboard radar, the model maker may simply decide to reduce the signal-to-noise ratio driving
the measurement error variances. This way, temporally varying biases would be buried in noise,
and the model is, in a way, guaranteed to be accepted by the validation agent. However, by doing
so, if the model maker is also the equipment maker, he or she would be admitting that his or
her equipment's performance is at best subpar. After all, a radar advertised through behavior
demonstrated via modeling and simulation as having a poor signal-to-noise ratio would not be a
desirable item to own. It follows that such a strategy would prove unwise if the model maker is
also the equipment maker who wishes to sell the equipment. Thus, the model maker has no choice
but to properly account for all sources of errors, including time-varying biases.




3. CORRELATED TIME SERIES


The number, $N_i$, of independent samples in the time series of a validation metric plays an
important role in the decision algorithm presented in the previous section. It can be estimated
approximately by dividing the total duration, $T$, of the time series by the correlation time, $\tau$, of
the time series:

$$
N_i \approx \frac{T}{\tau} \qquad (3)
$$

In other words, if we resample the time series at a rate of approximately $1/\tau$, then the $N_i$ resulting
samples can be treated as statistically independent. The correlation time, $\tau$, can be obtained by employing any
of the classical techniques discussed in the vast literature on time series analysis. For example, we
could estimate $\tau$ from the autocorrelation function of the time series. The correlation time would
then correspond to the point in time when the autocorrelation function falls, say, to $1/e$ times
its maximum value at zero delay. Alternatively, we could estimate the correlation time from the
power spectral density (PSD) of the time series, which is defined as the Fourier transform of the
autocorrelation function. In that case, the correlation time, $\tau$, would correspond to the inverse of
an appropriately defined "roll-off frequency," $\nu = 1/\tau$, of the PSD. What if there is more than
a single correlation time? In that case, we would resample the time series at a rate equal to one
over the longest correlation time. That way, the resulting samples can be treated as statistically
independent.
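A minimal Python sketch of the autocorrelation route to $\tau$ and of Eq. (3), assuming a uniformly sampled series and the $1/e$ crossing suggested above. The FFT-based autocorrelation, the function names, and the synthetic first-order Gauss-Markov test series (with an assumed sampling interval of 1 s) are illustrative choices, not the report's implementation.

```python
import numpy as np

def autocorrelation(x):
    """Sample autocorrelation (biased estimator), normalized to 1 at zero lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    spec = np.fft.rfft(x, 2 * n)              # zero-pad: linear, not circular, correlation
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    return acf / acf[0]

def correlation_time(x, dt):
    """First lag at which the autocorrelation falls to 1/e of its zero-lag value."""
    below = np.nonzero(autocorrelation(x) <= np.exp(-1.0))[0]
    if below.size == 0:
        raise ValueError("autocorrelation never reaches 1/e; series too short")
    return below[0] * dt

def independent_sample_count(duration, tau):
    """Eq. (3): N_i ~ T / tau, rounded down to a whole number of samples."""
    return int(duration // tau)

# Illustrative first-order Gauss-Markov (AR(1)) series: tau = 100 s, dt = 1 s.
rng = np.random.default_rng(3)
dt, tau_true, n = 1.0, 100.0, 200_000
phi = np.exp(-dt / tau_true)
scale = np.sqrt(1.0 - phi**2)
w = rng.standard_normal(n)
x = np.empty(n)
x[0] = w[0]
for k in range(1, n):
    x[k] = phi * x[k - 1] + scale * w[k]

tau_hat = correlation_time(x, dt)
print(tau_hat, independent_sample_count(n * dt, tau_hat))  # tau_hat near 100 s
```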

For a stochastic time series, $y(t)$, it can be shown that the PSD can be obtained directly from
the Fourier transform, $Y(f)$, of the time series through the relation [8]:


$$
\mathrm{PSD}(f) = \lim_{T \to \infty} E\left[ \frac{|Y(f)|^2}{T} \right] \qquad (4)
$$


where $f$ is the frequency, and $E(\cdot)$ denotes the expected value of $(\cdot)$. In the case of uniform sampling,
the Fourier integral can be approximated by the discrete Fourier transform (DFT) [9]:


$$
Y(f_n) \approx \Delta \sum_{k=0}^{N-1} y(t_k) \exp\left( -\frac{j 2\pi k n}{N} \right) \qquad (5)
$$


where $\Delta$ denotes the uniform sampling interval, $N$ is the total number of samples, $t_k = k\Delta$, and
$f_n = n/(N\Delta)$. The DFT for a uniformly sampled time series can be implemented using the efficient
fast Fourier transform (FFT) algorithm [9]. Hence, it follows that, for large $T = N\Delta$, we can approximate the PSD by


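$$
\mathrm{PSD}(f_n) \approx \frac{\Delta}{N} \left| \sum_{k=0}^{N-1} y(t_k) \exp\left( -\frac{j 2\pi k n}{N} \right) \right|^2 \qquad (6)
$$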
For non-uniformly sampled time series, more sophisticated techniques, such as the Lomb
periodogram method [10], must be considered for the estimation of the PSD.
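Eq. (6) maps directly onto an FFT. A minimal NumPy sketch, assuming uniform sampling with interval $\Delta$ (`dt` below) and returning the two-sided PSD estimate only at the non-negative frequencies $f_n = n/(N\Delta)$; the function name and the white-noise test input are hypothetical.

```python
import numpy as np

def periodogram(y, dt):
    """Eq. (6): PSD(f_n) ~ (dt / N) |sum_k y(t_k) exp(-j 2 pi k n / N)|^2.

    Returns f_n = n / (N dt) for n = 0..N//2 and the two-sided PSD estimate
    at those frequencies (for real y the negative-frequency half mirrors it).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    dft = np.fft.rfft(y)                 # NumPy uses the exp(-j 2 pi k n / N) convention
    psd = (dt / n) * np.abs(dft) ** 2
    freqs = np.fft.rfftfreq(n, d=dt)
    return freqs, psd

# Unit-variance white noise: the two-sided PSD should be flat at dt.
rng = np.random.default_rng(4)
dt = 0.5
y = rng.standard_normal(2**16)
freqs, psd = periodogram(y, dt)
print(psd[1:-1].mean())  # close to 0.5
```

Summing the two-sided estimate over all $N$ frequency bins and multiplying by the bin width $1/(N\Delta)$ recovers the mean square of the series exactly (Parseval's theorem), which is a convenient check on the normalization.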

Figure 1. A realization of a first-order Gauss-Markov time series, $\theta$, with standard deviation $\sigma_{GM} =
250$ units and correlation time $\tau_{GM} = 100$ s, and its power spectral density (PSD). The smooth curve in the
bottom panel is obtained from the theoretical expression for the PSD of a first-order Gauss-Markov process
with similar parameters. The dashed vertical line indicates the "roll-off frequency" of the PSD, corresponding
to $\nu_{GM} = 0.01$ Hz.

For illustration, we consider a time series prescribed by a first-order Gauss-Markov
process [11]. The theoretical expression for the PSD of a first-order Gauss-Markov process with
standard deviation $\sigma_{GM}$ and correlation time $\tau_{GM} = 1/\nu_{GM}$ is given by [11]


$$
\mathrm{PSD}_{GM}(f) = \frac{\sigma_{GM}^2}{\pi} \, \frac{\nu_{GM}}{f^2 + \nu_{GM}^2} \qquad (7)
$$


A realization, $\theta$, of the first-order Gauss-Markov time series with $\sigma_{GM} = 250$ units and $\tau_{GM} = 100$ s
is shown in the top panel of Figure 1. The time series, $\theta$, can represent any validation metric. Thus,
for convenience, we have chosen $\theta$ to be dimensionless. The PSD computed directly from the time
series, using Eq. (6), along with the theoretical PSD computed from Eq. (7), is shown in the bottom
panel of Figure 1. The theoretical value of the inverse of the correlation time, $\nu_{GM} = 0.01$ Hz, is
shown with the dashed vertical line. It follows that if we resample the time series shown in the
top panel of Figure 1 at a rate of 0.01 Hz, then the resulting sequence will approximately correspond to a white
Gaussian noise sequence.
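A minimal Python sketch of generating such a series and resampling it. It assumes the exact AR(1) discretization of a first-order Gauss-Markov process with autocorrelation $\sigma_{GM}^2 e^{-|t|/\tau_{GM}}$ and an assumed sampling interval of 1 s; the generator actually used for Figure 1 is not specified, and the function names are hypothetical. Under this model, samples spaced $m\tau_{GM}$ apart have correlation $e^{-m}$, so resampling at exactly $1/\tau_{GM}$ leaves a residual lag-one correlation of about 0.37, and spacing the samples by a few correlation times brings the sequence closer to white noise.

```python
import numpy as np

def gauss_markov(n, dt, sigma, tau, rng):
    """First-order Gauss-Markov sequence via the exact AR(1) discretization

        x[k+1] = phi * x[k] + sigma * sqrt(1 - phi**2) * w[k],  phi = exp(-dt / tau),

    started in steady state so that every sample has standard deviation sigma.
    """
    phi = np.exp(-dt / tau)
    w = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = sigma * w[0]
    scale = sigma * np.sqrt(1.0 - phi**2)
    for k in range(1, n):
        x[k] = phi * x[k - 1] + scale * w[k]
    return x

def lag_one_correlation(x):
    """Sample correlation between consecutive elements of x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(5)
dt, sigma, tau = 1.0, 250.0, 100.0          # Figure 1 parameters; dt = 1 s is assumed
theta = gauss_markov(400_000, dt, sigma, tau, rng)
step = int(round(tau / dt))                 # one sample per tau, i.e. 0.01 Hz
for m in (1, 2, 3):
    print(m, lag_one_correlation(theta[:: m * step]))  # roughly exp(-m)
print(lag_one_correlation(theta))           # near exp(-0.01), about 0.99
```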


Figure 2. Ten Monte Carlo realizations of the time series of a generic validation metric, $\theta$, shown in
black, and a single time series of the validation metric observed by the actual system, shown in blue. The
simulated and the observed time series are described by the same first-order Gauss-Markov process, with
standard deviation $\sigma_{GM} = 250$ units and correlation time $\tau_{GM} = 100$ s. The simulation bounds, prescribed
by the minimum and maximum values of the 10 Monte Carlo realizations, are shown in red. The dashed
vertical lines separate the so-called correlation segments.

To demonstrate how temporal correlations can be taken into account, we consider the
hypothetical scenario depicted in Figure 2. Here, we are given a set of $N_{MC} = 10$ Monte Carlo
realizations, shown in black, of the time series of a generic validation metric, $\theta$, described by a
first-order Gauss-Markov process with $\sigma_{GM} = 250$ units and $\tau_{GM} = 100$ s. Furthermore, we
consider the case when the time series observed by the actual system, shown in blue in Figure 2, is
described by the same first-order Gauss-Markov process. In other words, the null hypothesis, $H_0$,
corresponding to the hypothesis that the simulation is consistent with actual system performance,
is the true hypothesis. Thus, for our example, we have simply generated $N_{MC} + 1 = 11$ Monte
Carlo realizations of the same first-order Gauss-Markov process and have arbitrarily labeled one of
them as the time series observed by the actual system. However, due to the random nature of the
problem, there is always a chance that we may decide that the alternative hypothesis, $H_1$,
corresponding to the hypothesis that the simulation is inconsistent with actual system performance, is
the true hypothesis. In that case, we would reject a valid simulation. In binary hypothesis testing,
this type of error is referred to as the Type I error, and its probability, referred to as the model
maker's risk, $\alpha$, in the modeling and simulation literature, serves as a parameter of the decision
algorithm.

In order to determine the number, $N_i$, of independent samples, we must first estimate the
correlation time, $\tau$, of the time series, $\theta$. Subsequently, the number of independent samples can be
obtained from Eq. (3). Since the validation process is anchored to the simulation, the correlation
time, $\tau$, ought to be computed from the simulated time series, instead of from the time series
observed by the actual system. We can estimate the correlation time, $\tau$, by using any of the
techniques discussed earlier. Since the simulated time series are drawn from the same probability
distribution function, we can reduce the error in estimating the correlation time by combining
results obtained from the independent processing of the individual time series. For example, if we
decided to estimate the correlation time, $\tau$, from the PSD, then the PSDs computed separately
for each of the $N_{MC} = 10$ Monte Carlo realizations shown in Figure 2, say, using Eq. (6), can be
averaged to result in a smoother estimate of the PSD, thereby reducing the error in estimating the
roll-off frequency, $\nu = 1/\tau$.
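A minimal Python sketch of the averaging step, assuming all $N_{MC}$ realizations share the same uniform sampling interval and length. White noise is used as a stand-in test input because its true PSD is flat, which makes the reduction in scatter easy to see; for independent realizations, the relative scatter of each frequency bin falls by roughly a factor of $\sqrt{N_{MC}}$. The function names are hypothetical, and the roll-off estimator subsequently applied to the averaged PSD is left open, as in the text.

```python
import numpy as np

def averaged_periodogram(realizations, dt):
    """Average the Eq. (6) periodograms of N_MC equally long realizations.

    realizations: array of shape (N_MC, N) sampled every dt seconds.
    Returns f_n = n / (N dt) for n = 0..N//2 and the averaged two-sided PSD.
    """
    x = np.asarray(realizations, dtype=float)
    n = x.shape[1]
    psd = (dt / n) * np.abs(np.fft.rfft(x, axis=1)) ** 2
    return np.fft.rfftfreq(n, d=dt), psd.mean(axis=0)

def relative_scatter(psd):
    """Standard deviation over mean of the interior bins (DC and Nyquist excluded)."""
    interior = psd[1:-1]
    return float(interior.std() / interior.mean())

rng = np.random.default_rng(6)
mc = rng.standard_normal((10, 16_384))      # N_MC = 10 stand-in realizations
_, single = averaged_periodogram(mc[:1], dt=1.0)
_, averaged = averaged_periodogram(mc, dt=1.0)
print(relative_scatter(single), relative_scatter(averaged))  # roughly 1.0 and 0.32
```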

Once we have estimated the correlation time, $\tau$, from the simulated time series, we can divide
the time interval including both the simulated and the observed time series into so-called correlation
segments, each of duration $\tau$. The correlation segments are shown separated by dashed vertical
lines in Figure 2. For the simulated time series, the samples within each segment are correlated,
whereas samples selected from different segments and separated by at least one correlation time,
$\tau$, are statistically independent. If the true hypothesis is the null hypothesis, $H_0$, corresponding
to the hypothesis that the simulation is consistent with actual system performance, then the same
behavior is obtained for the observed time series. In other words, in case of $H_0$ being the true
hypothesis, if, for any time series (whether simulated or observed), we pick a sample from each
correlation segment, then, as long as the selected samples are at least one correlation time apart,
the resulting sequence will approximately be a white Gaussian noise sequence.

The simulation bounds, prescribed by the minimum and maximum values of the 10 Monte Carlo
realizations, are shown in red in Figure 2. Given a sequence of τ-separated samples of the observed
time series and simulation bounds valid at the times of the selected samples, we count the number
of times the samples fall outside of those bounds. Next, we compare the number of samples
that fall outside of the simulation bounds with a rejection threshold, γ. The rejection threshold is
determined from the number, N_I, of independent samples; the probability, p, of an arbitrary sample
falling outside of the simulation bounds; and the model maker's risk, α. If the number of τ-separated
samples falling outside of the simulation bounds is smaller than the rejection threshold, γ, then we
declare the simulation to be consistent with actual system performance. If the true hypothesis is
the alternative hypothesis, H_1, then there is no reason to expect the observed time series to behave
in the way predicted by the simulation. Specifically, resampling the observed time series at a rate
of 1/τ, where the correlation time, τ, is estimated from the simulated time series, may not result
in an uncorrelated sequence. However, this discrepancy can only add to the inconsistency between
the simulation and actual system performance and would therefore not degrade the performance
of the decision algorithm.
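The decision rule just described can be sketched as follows. The sketch makes two assumptions. First, Eq. (2) is taken to have the min/max order-statistics form p = 2/(N_MC + 1), which is consistent with the values p × 100 ≈ 18 and 4 quoted later for N_MC = 10 and 50. Second, γ is taken as the smallest integer whose binomial upper tail does not exceed α. The function names and the decimation step of 16 samples are hypothetical.

```python
import math

import numpy as np


def rejection_threshold(p, n_i, alpha):
    """Smallest integer gamma with P(K >= gamma) <= alpha, where K ~ Binomial(n_i, p)."""
    tail = 0.0  # holds P(K >= k + 1) at the top of each iteration
    for k in range(n_i, -1, -1):
        tail_k = tail + math.comb(n_i, k) * p**k * (1 - p) ** (n_i - k)  # P(K >= k)
        if tail_k > alpha:
            return k + 1
        tail = tail_k
    return 0  # only reached when alpha >= 1


def tau_separated_test(observed, simulated, step, alpha, offset=0):
    """Count tau-separated observed samples outside the min/max simulation bounds."""
    lo, hi = simulated.min(axis=0), simulated.max(axis=0)
    idx = np.arange(offset, observed.size, step)  # one sample per correlation segment
    n_out = int(np.count_nonzero((observed[idx] < lo[idx]) | (observed[idx] > hi[idx])))
    p = 2.0 / (simulated.shape[0] + 1)  # assumed form of Eq. (2)
    gamma = rejection_threshold(p, idx.size, alpha)
    return n_out, gamma, n_out < gamma  # last entry True -> declare consistent


rng = np.random.default_rng(0)
simulated = rng.standard_normal((10, 1024))  # N_MC = 10 simulated realizations
observed = rng.standard_normal(1024)
print(tau_separated_test(observed, simulated, step=16, alpha=0.01))
# Observation shifted by +5 units: nearly every sample exceeds the upper bound.
print(tau_separated_test(observed + 5.0, simulated, step=16, alpha=0.01))
```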

The choice of which sample to pick as the starting point of the resampling process is somewhat
arbitrary. For instance, we could choose to always pick the first sample in each of the correlation
segments shown in Figure 2. Alternatively, we could have chosen to always pick the second sample
in each of the correlation segments, or the third sample, and so on. Once we have committed to a
particular starting point, we may begin to wonder whether the ignored samples might have afforded
any further utility. Also, what if our chosen starting point happens to produce a sequence that
corresponds to a statistical outlier, thereby skewing the validation process, whereas another
starting point might have produced a more typical sequence? One way to resolve
such quandaries would be to consider all possible starting points: resampling the observed time
series by picking the first sample in each correlation segment, then resampling the observed
time series by picking the second sample in each correlation segment, and so on, until we reach the
last sample position within the correlation segments. We could then report our result based on an
average of all possible cases, along with an appropriate confidence interval. While such an approach
might present a viable solution, we opt for a simpler procedure.
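For comparison, the all-starting-points alternative, which the report does not adopt, could be sketched as below. The spread of the per-offset outside fractions serves only as a crude stand-in for the confidence interval mentioned above. The function name and the synthetic data are hypothetical. Because the offsets partition the record, the per-offset counts add up exactly to the all-sample count used by the simpler procedure described next.

```python
import numpy as np


def outside_fraction_per_offset(observed, simulated, step):
    """Outside-bounds count and sample size for every possible starting offset."""
    lo, hi = simulated.min(axis=0), simulated.max(axis=0)
    outside = (observed < lo) | (observed > hi)  # one flag per observed sample
    counts = np.array([outside[off::step].sum() for off in range(step)])
    sizes = np.array([outside[off::step].size for off in range(step)])
    return counts, sizes, outside


rng = np.random.default_rng(3)
simulated = rng.standard_normal((10, 1024))
observed = rng.standard_normal(1024)
counts, sizes, outside = outside_fraction_per_offset(observed, simulated, step=16)
fractions = counts / sizes
print(f"mean fraction {fractions.mean():.3f}, range [{fractions.min():.3f}, {fractions.max():.3f}]")
```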


Instead of partitioning the data window into correlation segments and subsequently picking
τ-separated samples of the observed time series from each segment, we consider all samples;
that is, we choose to ignore any correlation that may exist between the samples. Specifically, we
examine whether each of the samples falls inside or outside of the simulation bounds. Of course, by
doing so, we introduce an error, since the invocation of the binomial probability distribution
function requires the samples to be statistically independent. However, if correlation effects are
taken into account in the computation of the rejection threshold, γ, then, we argue, the effect of
this error on the validation process will be innocuous. In other words, we would reach the same
decision on the validity of a simulation as we would had we incorporated the correlation effects in
the averaging process discussed above. This method has the advantage of using all the available data
without the need to resort to any complicated counting or averaging procedure. We therefore regard it
as more practical.


We illustrate the concept by employing a first-order Gauss-Markov process to model the time
series of a generic validation metric⁴. All time series contain N = 1024 samples and are sampled
uniformly at a rate of 1 Hz; thus, T = 1024 s. The correlation time is increased from τ_GM = 1 s
to τ_GM = 1024 s in factors of 2. In other words, we start with a time series that can be regarded
as a white Gaussian noise sequence, and we end with a time series that is more or less completely
correlated, with correlation time equal to the duration of the time series, T. The standard deviation,
σ_GM = 1 unit, is the same for all time series. We consider the following numerical experiment. For
each correlation time, τ_GM, we are given a set of N_MC Monte Carlo realizations of time series
representing simulation results. We use these realizations to compute the simulation bounds. Also,
from the number of Monte Carlo trials, N_MC, we compute the probability p given in Eq. (2). Next,
we generate a set of 1000 first-order Gauss-Markov time series representing results observed by the
actual system. The set of 1000 time series have the same σ_GM and τ_GM as the simulation; in other
words, the true hypothesis is the null hypothesis, H_0. For each of the 1000 time series, we compute
a so-called rejection index, ρ:


$$\rho = \frac{N_{\mathrm{out}}}{N} \times 100 \tag{8}$$


where N_out is the number of observed samples that fall outside of the simulation bounds. It follows
that 0 ≤ ρ ≤ 100. We note again that the rejection index, ρ, is computed using all available
samples. We repeat this process for each of the different correlation times, so, for each correlation
time, τ_GM, we compute 1000 rejection indices.
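The experiment can be reproduced in outline with the sketch below. It generates first-order Gauss-Markov series with the exact discrete recursion x_{k+1} = e^{−Δt/τ} x_k + σ√(1 − e^{−2Δt/τ}) w_k. This is an assumption; the report does not state how its series were generated. The sketch builds min/max bounds from N_MC simulated realizations and evaluates Eq. (8) on every sample of each observed series. To keep it quick, it uses 200 observations rather than 1000 and only three correlation times.

```python
import numpy as np


def gauss_markov(n_series, n_samples, tau, sigma=1.0, dt=1.0, rng=None):
    """First-order Gauss-Markov series (one per row), started in steady state."""
    rng = np.random.default_rng() if rng is None else rng
    phi = np.exp(-dt / tau)
    drive = sigma * np.sqrt(-np.expm1(-2.0 * dt / tau))  # sigma * sqrt(1 - phi**2)
    x = np.empty((n_series, n_samples))
    x[:, 0] = sigma * rng.standard_normal(n_series)
    for k in range(1, n_samples):
        x[:, k] = phi * x[:, k - 1] + drive * rng.standard_normal(n_series)
    return x


def rejection_index(observed, simulated):
    """Eq. (8): percentage of samples outside the min/max simulation bounds (per row)."""
    lo, hi = simulated.min(axis=0), simulated.max(axis=0)
    outside = (observed < lo) | (observed > hi)
    return 100.0 * outside.mean(axis=-1)


rng = np.random.default_rng(0)
n_mc, n, n_obs = 10, 1024, 200
results = {}
for tau_gm in (1.0, 16.0, 256.0):
    simulated = gauss_markov(n_mc, n, tau_gm, rng=rng)
    observed = gauss_markov(n_obs, n, tau_gm, rng=rng)  # H_0 is true by construction
    rho = rejection_index(observed, simulated)
    results[tau_gm] = rho
    print(f"tau_GM = {tau_gm:6.0f} s: mean {rho.mean():5.1f}, range [{rho.min():.1f}, {rho.max():.1f}]")
```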

Results are summarized in Figure 3. The vertical lines indicate the range of values of ρ
obtained over the course of 1000 observations. The horizontal dashes above and below the lines
indicate the maximum and minimum values of ρ, respectively, obtained over the course of 1000
observations, while the dots represent their averages. Results shown in the top panel of Figure 3
correspond to the case when there are N_MC = 10 Monte Carlo realizations of the simulated time
series available, while results shown in the bottom panel correspond to the case of N_MC = 50.
Values of p × 100 for the two scenarios are represented by the blue horizontal lines. In the case of
N_MC = 10, p × 100 ≈ 18, while in the case of N_MC = 50, p × 100 ≈ 4. We note that, as expected,
the average values of ρ match the values of p × 100. Since we have ignored any correlation effects in
computing the rejection index, we also note that the range of values obtained for ρ over the course
of 1000 observations increases with increasing correlation time, τ_GM. In the case of τ_GM = 1 s,
the underlying time series are effectively uncorrelated, with the correlation time being equal to
the sampling interval. Hence, for τ_GM = 1 s, ρ deviates only slightly from the mean value of
p × 100. However, as the correlation time increases, this deviation increases. As discussed earlier,
this deviation will have no impact on the validation process if correlation effects are taken into
account in the computation of the rejection threshold, γ.
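The values p × 100 ≈ 18 and 4 follow from an exchangeability argument for min/max bounds. A fresh independent draw is equally likely to occupy any of the N_MC + 1 ranks, and it lies outside [min, max] only if it is the smallest or the largest, so p = 2/(N_MC + 1). Eq. (2) is not reproduced in this section. Assuming this is its content, the short check below compares the formula with a brute-force estimate; the function name is illustrative.

```python
import numpy as np


def p_outside_minmax(n_mc):
    """Probability that one more i.i.d. draw falls outside the min/max of n_mc draws."""
    return 2.0 / (n_mc + 1)


rng = np.random.default_rng(2)
estimates = {}
for n_mc in (10, 50):
    draws = rng.standard_normal((50_000, n_mc + 1))
    bounds, fresh = draws[:, :n_mc], draws[:, n_mc]
    outside = (fresh < bounds.min(axis=1)) | (fresh > bounds.max(axis=1))
    estimates[n_mc] = 100.0 * outside.mean()
    print(f"N_MC = {n_mc}: formula {100 * p_outside_minmax(n_mc):.2f}, simulated {estimates[n_mc]:.2f}")
```

The result does not depend on the Gaussian choice; any continuous distribution gives the same probability.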


Similar to the notion of a rejection index, ρ, defined in Eq. (8), we define a normalized rejection
threshold, γ̄:

$$\bar{\gamma}(p, N_I) = \frac{\gamma(p, N_I)}{N_I} \times 100 \tag{9}$$


The rejection threshold, γ, is obtained from

$$\alpha = \int_{\gamma}^{\infty} f_{\mathrm{binomial}}(s; p, N_I)\, ds, \tag{10}$$

where f_binomial denotes the probability distribution function of a binomial random variable with
parameters p and N_I:

$$f_{\mathrm{binomial}}(s; p, N_I) = \sum_{k=0}^{N_I} \binom{N_I}{k} p^k (1 - p)^{N_I - k}\, \delta(s - k), \tag{11}$$

where δ(·) is the Dirac delta function. The number, N_I, of independent samples can be obtained
from an appropriate estimate of the correlation time through Eq. (3). For our numerical experiment,
the values of γ̄ computed from Eq. (9), corresponding to α = 0.01, are shown in red in Figure 3.
It is evident that the large deviations in ρ, due mainly to ignoring correlation effects, will have no
impact on the simulation validation process as long as the correlation effect is taken into account
in the computation of the rejection threshold.
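Because of the Dirac deltas, the integral in Eq. (10) reduces to a sum of binomial probabilities over k ≥ γ. The sketch below evaluates γ̄ from Eq. (9) for the record length of the numerical experiment (T = 1024 s). It makes three assumptions: Eq. (3) gives N_I = T/τ, p = 2/(N_MC + 1), and γ is the smallest integer whose upper tail does not exceed α. The report's own rounding conventions may differ, so the numbers need not reproduce Figure 3 exactly. Values above 100% mean that no number of outside samples can trigger a rejection.

```python
import math

import numpy as np


def binomial_upper_tail(p, n_i):
    """tail[k] = P(K >= k) for K ~ Binomial(n_i, p), computed in log space for large n_i."""
    log_pmf = [
        math.lgamma(n_i + 1) - math.lgamma(k + 1) - math.lgamma(n_i - k + 1)
        + k * math.log(p) + (n_i - k) * math.log1p(-p)
        for k in range(n_i + 1)
    ]
    pmf = np.exp(log_pmf)
    return np.cumsum(pmf[::-1])[::-1]


def normalized_threshold(p, n_i, alpha):
    """Eq. (9): gamma_bar = 100 * gamma / N_I, with gamma taken from Eq. (10)."""
    tail = binomial_upper_tail(p, n_i)
    passing = np.nonzero(tail <= alpha)[0]
    gamma = int(passing[0]) if passing.size else n_i + 1  # n_i + 1: rejection impossible
    return 100.0 * gamma / n_i


T, alpha = 1024.0, 0.01
taus = [2.0**j for j in range(11)]  # 1 s ... 1024 s
gbar = {n_mc: [normalized_threshold(2.0 / (n_mc + 1), int(T // tau), alpha) for tau in taus]
        for n_mc in (10, 50)}
for tau, g10, g50 in zip(taus, gbar[10], gbar[50]):
    print(f"tau = {tau:6.0f} s: gamma_bar = {g10:6.1f}% (N_MC=10), {g50:6.1f}% (N_MC=50)")
```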

From the results shown in Figure 3, we note that as the correlation time increases, the
normalized rejection threshold, γ̄, increases in a way similar to the increase observed in the range
of values covered by the rejection index, ρ, over the set of 1000 observations. Also, by comparing the
results shown in the top and bottom panels of Figure 3, corresponding to N_MC = 10 and N_MC = 50,
respectively, we note that the magnitude of the increase in both γ̄ and the range of values covered
by ρ decreases with (1) increasing number of Monte Carlo realizations of the simulated time series
and (2) decreasing correlation time. For example, in the case of the correlation time being equal to
one quarter of the duration of the time series, corresponding to the case of τ_GM = 256 s in Figure 3,
the value of the normalized threshold, γ̄, for N_MC = 50 is roughly one third that for N_MC = 10.
Similarly, we would expect a lower value for γ̄ if the duration of the time series were longer.

Figure 3. Variation of the rejection index, ρ, and the normalized rejection threshold, γ̄, with correlation
time, τ. The top panel shows results corresponding to the case when there are N_MC = 10 Monte Carlo
realizations of the simulated time series, while the bottom panel shows results corresponding to the case of
N_MC = 50. Both the simulated and observed time series are modeled as first-order Gauss-Markov processes
with matching parameters; in other words, the true hypothesis is the null hypothesis, H_0, corresponding to
the hypothesis that the simulation is consistent with actual system performance. The normalized rejection
threshold, γ̄, corresponding to α = 0.01, is shown in red. The vertical lines indicate the range of values of ρ
obtained over the course of 1000 observations. The horizontal dashes above and below the lines indicate the
maximum and minimum values of ρ, respectively, obtained over the course of 1000 observations, while the
dots represent their averages. The blue horizontal lines correspond to p × 100, where p is given by Eq. (2).

3.1 HOW MANY MONTE CARLO REALIZATIONS?


A question often asked is: “How many Monte Carlo realizations are sufficient to validate a
given modeling and simulation product using the method proposed in this report?” The frank
answer is: “It depends.” In general, it is not possible to promulgate a single number, N_MC, of
Monte Carlo realizations as a gold standard universally applicable to all simulations. As we saw
in the discussion of correlated time series above, the duration, T, and the correlation time, τ,
of the time series of the validation metrics play key roles in arriving at acceptable rejection
thresholds, γ. As we saw in Figure 3, as the correlation time, τ, of a given time series becomes larger
(or, equivalently, as the duration, T, of the time series becomes smaller), the normalized rejection
threshold, γ̄, may become excessively large. In other words, as τ becomes larger (or as T becomes
smaller), a larger and larger fraction of the observed samples must fall outside of the simulation
bounds before we can reject the simulation. This may be overly conservative, thus reducing the
fidelity of the validation process. We need more information (say, by observing the time series of a
given validation metric for a longer period of time) to make a more accurate assessment. Of course,
we do not always have the luxury of observing a time series for as long as we desire. For example, in
the case of tracking radars, the tracked target might exit the radar's field of view before sufficient
information has been gathered.

By comparing the two plots in Figure 3, we note that as we increase the number, N_MC,
of Monte Carlo realizations from 10 (top panel) to 50 (bottom panel), the normalized rejection
threshold, γ̄, becomes smaller. Hence, for short time series (or time series with large correlation
times), we may wish to obtain a larger number of Monte Carlo realizations. Of course, there is a
limit to this strategy. For example, if the particular geometry of a radar tracking scenario happens
to impose a fundamental limit on the amount of information extractable for the purposes of modeling
and simulation validation, then the availability of a larger number of Monte Carlo realizations would
not necessarily increase the fidelity of the validation process. Under these circumstances, we ought
instead to reject the observation as a reliable anchoring point for the validation of a given modeling
and simulation product.

The relationship among the number, N_MC, of Monte Carlo realizations, the duration, T,
the correlation time, τ, the rejection threshold, γ, and the model maker's risk, α, is given by
Eq. (10), which we write more explicitly as

$$\alpha = \int_{\gamma}^{\infty} f_{\mathrm{binomial}}\left(s; \frac{2}{N_{MC} + 1}, \frac{T}{\tau}\right) ds. \tag{12}$$
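Equation (12) can be evaluated directly for a chosen normalized threshold. The sketch below computes α for γ̄ = 20%, the setting of Figure 5. It takes p = 2/(N_MC + 1) for min/max bounds and N_I = T/τ, and lets the Dirac deltas in Eq. (11) contribute every integer k ≥ γ. The function name is hypothetical, and the report's handling of the boundary term at k = γ may differ.

```python
import math


def model_makers_risk(n_mc, n_i, gamma_bar_percent):
    """Eq. (12): alpha = P(K >= gamma), K ~ Binomial(N_I, 2 / (N_MC + 1)), gamma = gamma_bar * N_I / 100."""
    p = 2.0 / (n_mc + 1)
    k_min = math.ceil(gamma_bar_percent * n_i / 100.0)  # only integer k >= gamma contribute
    return sum(math.comb(n_i, k) * p**k * (1 - p) ** (n_i - k) for k in range(k_min, n_i + 1))


for n_i in (1, 5, 25, 100):
    risks = [model_makers_risk(n_mc, n_i, 20.0) for n_mc in (10, 25, 50, 100)]
    print(n_i, ["%.2e" % a for a in risks])
```

With a single independent sample, α is simply p = 2/(N_MC + 1), so adding realizations helps only slowly. With more independent samples, the decrease in α with N_MC is much steeper.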

For α = 0.01, plots of the normalized rejection threshold, γ̄, versus the number, N_I = T/τ, of
independent samples for 10, 25, 50, 75, and 100 Monte Carlo realizations are shown in Figure 4. For
γ̄ = 20%, plots of the model maker's risk, α, versus the number, N_MC, of Monte Carlo realizations
for 1, 5, 10, 25, 50, 75, and 100 independent samples are shown in Figure 5. In both figures, the
jaggedness in the curves is due to the discrete nature of the problem, which is revealed by the Dirac
delta function in Eq. (11). The plots in Figures 4 and 5 reveal the monotonic relations that exist
among α, γ̄, N_I, and N_MC. In general, such plots ought to be used to determine the appropriate
number of Monte Carlo realizations and to assess the efficacy of a particular observation used as a
modeling and simulation validation anchor. For example, as seen in Figure 5, the model maker's
risk, α, decreases with increasing number, N_MC, of Monte Carlo realizations. The decrease in α
becomes more pronounced as the number of independent samples increases. As revealed in both
figures, as the number of independent samples becomes smaller, an increase in the number, N_MC, of
Monte Carlo realizations does not necessarily increase the efficacy of the validation process. Under
such circumstances, validation results would be inconclusive.

Figure 4. Variation of the normalized rejection threshold, γ̄, with the number of independent samples for
10, 25, 50, 75, and 100 Monte Carlo realizations. For all plots, α = 0.01. The jaggedness in the curves is
due to the discrete nature of the problem, which is revealed by the Dirac delta function in Eq. (11).

Figure 5. Variation of the model maker's risk, α, with the number of Monte Carlo realizations for 1, 5, 10,
25, 50, 75, and 100 independent samples. For all plots, γ̄ = 20%. The jaggedness in the curves is due to the
discrete nature of the problem, which is revealed by the Dirac delta function in Eq. (11).

4. ACCEPTABILITY CRITERIA


In this section, the validation metrics needed to validate the modeling and
simulation of a generic tracking radar are identified. A list of 26 validation metrics is given in
Table 1. All validation metrics are functions of time. For convenience, the time dependence of
the validation metrics has been suppressed in the notation. We have assumed a phased-array
radar, with the measurement space consisting of the range, r, to the target and the two orthogonal
direction cosines u and v. The validation metrics can be easily modified to accommodate other
types of radar, such as dish radars. The target state estimation problem has been limited to the
case when the position and velocity vectors are sufficient to characterize the dynamics of the target.
The list in Table 1 can be extended to accommodate higher-dimensional state vectors as needed.

The  validation  metrics  listed  in  Table  1  can  be  divided  into  three  broad  categories: 


1. Macro tracker output validation metrics: Items 1 through 6 in Table 1 are sufficient
to characterize the general behavior of the tracker output. These validation metrics are
particularly useful within a multiple-sensor configuration where track data are shared among
the sensors. The total position and velocity estimation errors valid at time index k are given
by

$$\delta\mathbf{r}_{k|k} = \hat{\mathbf{r}}_{k|k} - \mathbf{r}_k \quad \text{and} \quad \delta\mathbf{v}_{k|k} = \hat{\mathbf{v}}_{k|k} - \mathbf{v}_k, \tag{13}$$

where r̂_k|k and v̂_k|k correspond to the updated target position and velocity vector estimates
valid at time index k, respectively, while r_k and v_k correspond to the true target position
and velocity vectors valid at time index k, respectively. The square roots of the traces
of the position and velocity quadrants of the error covariance matrix provide estimates of
the size of the error hyper-ellipsoid. A characterization of the shape and orientation of the
error hyper-ellipsoid, in turn, can be obtained from the so-called normalized estimation error
squared (NEES) [1]:

$$\mathrm{NEES}_k = \left(\hat{\mathbf{x}}_{k|k} - \mathbf{x}_k\right)^T P_{k|k}^{-1} \left(\hat{\mathbf{x}}_{k|k} - \mathbf{x}_k\right), \tag{14}$$

where x̂_k|k is the updated target state estimate valid at time index k; x_k is the true target
state valid at time index k; and P_k|k is the updated state estimation error covariance valid at
time index k. The normalized innovation squared (NIS) provides another useful validation
metric [1]:

$$\mathrm{NIS}_k = \left[\mathbf{z}_k - \mathbf{h}_k(\hat{\mathbf{x}}_{k|k-1})\right]^T \left(H_k P_{k|k-1} H_k^T + R_k\right)^{-1} \left[\mathbf{z}_k - \mathbf{h}_k(\hat{\mathbf{x}}_{k|k-1})\right], \tag{15}$$

where z_k is the measurement vector valid at time index k; x̂_k|k−1 is the predicted target state
estimate valid at time index k; h_k(·) is the measurement function valid at time index k;
R_k is the measurement error covariance valid at time index k; P_k|k−1 is the predicted state
estimation error covariance valid at time index k; and H_k is the sensitivity matrix valid at
time index k.

2. Micro tracker output validation metrics: Items 7 through 18 in Table 1 provide a
more detailed characterization of the behavior of the tracker output. Similar to Eq. (13), the
components of the target state estimation error vector, δx_r, δx_u, δx_v, δx_ṙ, δx_u̇, and δx_v̇, are
defined as the difference between the estimated and true values. In addition to the diagonal
elements of the state estimation error covariance matrix, the off-diagonal elements can also
be considered. These are of value particularly when the detailed shape and orientation of
the error hyper-ellipsoid are of concern. An inspection of the micro tracker output validation
metrics can potentially help narrow down the possible sources of simulation inconsistency revealed
by the macro tracker output validation metrics.

3.  Macro  radar  front-end  output  validation  metrics:  Items  19  through  26  characterize 
the  behavior  of  the  tracker  input.  Many  of  the  inconsistencies  observed  in  the  tracker  output 
metrics  can  be  traced  back  to  the  tracker  input.  Thus,  while  not  strictly  necessary  for 
validating  the  modeling  and  simulation  of  a  given  tracking  radar,  the  macro  radar  front-end 
output  validation  metrics  often  provide  invaluable  diagnostics  as  to  the  cause  of  the  observed 
inconsistencies. 
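Items 1 through 6 of Table 1 can be computed at each time index along the lines of the sketch below. It assumes the state is ordered as three position components followed by three velocity components, matching the quadrants P(1:3, 1:3) and P(4:6, 4:6) in Table 1. The radar's measurement function is not specified here, so for the NIS the sketch takes the predicted measurement h_k(x̂_k|k−1) and the Jacobian H_k as inputs. The function names and the toy linear measurement in the usage line are illustrative.

```python
import numpy as np


def macro_metrics(x_upd, x_true, P_upd):
    """Items 1-5 of Table 1 at one time index k (state = [position(3), velocity(3)])."""
    e = x_upd - x_true  # Eq. (13), position and velocity errors stacked
    return {
        "pe": float(np.linalg.norm(e[:3])),  # total position estimation error
        "ve": float(np.linalg.norm(e[3:6])),  # total velocity estimation error
        "tp": float(np.sqrt(np.trace(P_upd[:3, :3]))),
        "tv": float(np.sqrt(np.trace(P_upd[3:6, 3:6]))),
        "NEES": float(e @ np.linalg.solve(P_upd, e)),  # Eq. (14) without an explicit inverse
    }


def nis(z, z_pred, H, P_pred, R):
    """Eq. (15): z_pred = h_k(x_{k|k-1}), H = Jacobian of h_k, P_pred = P_{k|k-1}."""
    nu = z - z_pred  # innovation
    S = H @ P_pred @ H.T + R  # innovation covariance
    return float(nu @ np.linalg.solve(S, nu))


m = macro_metrics(np.array([1.0, 0, 0, 0.5, 0, 0]), np.zeros(6), np.eye(6))
H = np.hstack([np.eye(3), np.zeros((3, 3))])  # toy linear position measurement
print(m, nis(np.array([2.0, 0, 0]), np.zeros(3), H, np.eye(6), np.eye(3)))
```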


TABLE 1: Validation Metrics

Item | Validation Metric | Symbol | Code Name
1 | Total Position Estimation Error | ‖δr‖ | pe
2 | Total Velocity Estimation Error | ‖δv‖ | ve
3 | Square Root of the Trace of the Upper-Left 3×3 Quadrant of the State Estimation Error Covariance Matrix, Corresponding to the Variance of the Total Position Estimation Error | √tr[P(1:3, 1:3)] | tp
4 | Square Root of the Trace of the Lower-Right 3×3 Quadrant of the State Estimation Error Covariance Matrix, Corresponding to the Variance of the Total Velocity Estimation Error | √tr[P(4:6, 4:6)] | tv
5 | Normalized Estimation Error Squared (NEES) | NEES | NEES
6 | Normalized Innovation Squared (NIS) | NIS | NIS
7 | Range Estimation Error | δx_r | dx1
8 | u Estimation Error | δx_u | dx2
9 | v Estimation Error | δx_v | dx3
10 | Range Rate Estimation Error | δx_ṙ | dx4
11 | u̇ Estimation Error | δx_u̇ | dx5
12 | v̇ Estimation Error | δx_v̇ | dx6
13 | Standard Deviation of the Range Estimation Error | √P_rr | sx1
14 | Standard Deviation of the u Estimation Error | √P_uu | sx2
15 | Standard Deviation of the v Estimation Error | √P_vv | sx3
16 | Standard Deviation of the Range Rate Estimation Error | √P_ṙṙ | sx4
17 | Standard Deviation of the u̇ Estimation Error | √P_u̇u̇ | sx5
18 | Standard Deviation of the v̇ Estimation Error | √P_v̇v̇ | sx6
19 | Range Measurement Error | δz_r | dz1
20 | u Measurement Error | δz_u | dz2
21 | v Measurement Error | δz_v | dz3
22 | Standard Deviation of the Range Measurement Error | σ_r | sz1
23 | Standard Deviation of the u Measurement Error | σ_u | sz2
24 | Standard Deviation of the v Measurement Error | σ_v | sz3
25 | Measured Signal-to-Noise Ratio | SNR | SNR
26 | Measured Target Radar Cross-Section | RCS | RCS

Using the validation procedure discussed in Sections 2 and 3, results for a given modeling
and simulation product are summarized in a so-called scorecard. The scorecard contains a listing
of the rejection indices, ρ, and normalized rejection thresholds, γ̄, for the validation metrics listed
in Table 1. It is evident that many of the validation metrics in Table 1 are dependent on one
another. For example, all validation metrics corresponding to the measurement error and state
estimation error covariances are dependent on the signal-to-noise ratio. By presenting the results
in the form of a scorecard, correlations among the validation metrics become immediately apparent;
thus, the scorecard can additionally serve as a diagnostic tool. By considering rejection thresholds
corresponding to different values of the model maker's risk, α, and by noting the cross-correlation
between select validation metrics, a validation agent can use a scorecard to declare a given modeling
and simulation product valid or invalid. Using numerical examples, we examine the utility of
such scorecards in the next section.
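In essence, a scorecard is a table of ρ against γ̄ at several values of α. The sketch below shows only the bookkeeping. It assumes that ρ and γ̄ have already been computed for each metric, for instance as in Section 3. Following the decision rule above, a metric is flagged when ρ reaches or exceeds γ̄. The three metric entries are made-up placeholders, not results from this report.

```python
def scorecard(rows, alphas=(0.01, 0.05, 0.1)):
    """rows: {code_name: (rho, {alpha: gamma_bar})}. Prints a table; returns flagged metrics per alpha."""
    flagged = {a: [] for a in alphas}
    print(f"{'metric':>6} {'rho':>6} " + " ".join(f"a={a:<5}" for a in alphas))
    for name, (rho, gbar) in rows.items():
        marks = []
        for a in alphas:
            exceeded = rho >= gbar[a]  # metric rejected at model maker's risk a
            if exceeded:
                flagged[a].append(name)
            marks.append(f"{'FAIL' if exceeded else 'ok':<7}")
        print(f"{name:>6} {rho:6.1f} " + " ".join(marks))
    return flagged


# Hypothetical placeholder values, for illustration only.
rows = {
    "pe": (12.0, {0.01: 30.0, 0.05: 25.0, 0.1: 22.0}),
    "dx6": (24.0, {0.01: 28.0, 0.05: 23.0, 0.1: 21.0}),
    "SNR": (5.0, {0.01: 20.0, 0.05: 16.0, 0.1: 14.0}),
}
flagged = scorecard(rows)
```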
5. CASE STUDY


In this section, we devise a controlled numerical experiment to examine the effectiveness of
the proposed validation algorithm. We choose a satellite as the target and use a phased-array
radar to track the satellite. The measurement space consists of the range, r, to the target and the
two orthogonal direction cosines u and v. We model the target radar cross-section (RCS) as an
independent and identically distributed log-normal stochastic process with mean μ_RCS and variance
σ²_RCS.

The signal-to-noise ratio (SNR) valid at time index k can be obtained from [12]

$$\mathrm{SNR}_k = \left(\frac{R_0}{r_k}\right)^4 \frac{\mathrm{RCS}_k}{1\ \mathrm{m}^2}, \tag{16}$$

where R_0 denotes the distance to a perfectly conducting sphere with a cross-sectional area of
1 m² at which the SNR is 0 dB. Here, R_0 encompasses contributions from the radar receiver and
transmitter functions, along with the relevant losses that appear in the radar range equation [Id].
The measurement error variances induced by the receiver thermal noise are functions of SNR and
can be expressed as [12]

$$\sigma_i^2 = \frac{A_i^2}{2 \cdot \mathrm{SNR}_k} + B_i^2, \qquad i = r, u, v, \tag{17}$$

where A_i and B_i are pre-specified parameters.
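The chain from RCS to measurement noise might be sketched as follows. The sketch converts the specified RCS mean and variance into log-normal parameters using the standard moment-matching relations. It then applies Eq. (16), followed by Eq. (17) read as σ_i² = A_i²/(2·SNR_k) + B_i²; that reading of the equation is itself an assumption. The values of R_0, the range, A_i, and B_i below are hypothetical placeholders, not parameters from the case study.

```python
import numpy as np


def lognormal_params(mean, var):
    """Parameters (mu, s) of ln(X) such that X has the given mean and variance."""
    s2 = np.log1p(var / mean**2)
    return np.log(mean) - 0.5 * s2, np.sqrt(s2)


def snr(range_m, rcs_m2, r0_m):
    """Eq. (16): SNR = (R0 / r)^4 * RCS / (1 m^2), as a linear power ratio."""
    return (r0_m / range_m) ** 4 * rcs_m2 / 1.0


def measurement_variance(snr_k, a_i, b_i):
    """Assumed reading of Eq. (17): sigma_i^2 = A_i^2 / (2 SNR) + B_i^2."""
    return a_i**2 / (2.0 * snr_k) + b_i**2


rng = np.random.default_rng(4)
mu_rcs, var_rcs = 1.0, 0.25  # hypothetical RCS mean (m^2) and variance (m^4)
mu_ln, s_ln = lognormal_params(mu_rcs, var_rcs)
rcs = rng.lognormal(mean=mu_ln, sigma=s_ln, size=200_000)  # i.i.d. RCS samples
snr_k = snr(1.0e6, rcs, r0_m=2.0e6)  # hypothetical range and R0 (m)
sigma2_r = measurement_variance(snr_k, a_i=10.0, b_i=1.0)  # hypothetical A_r, B_r (m)
print(rcs.mean(), rcs.var(), np.median(snr_k), np.sqrt(sigma2_r).mean())
```

Because SNR in Eq. (16) is linear in RCS, doubling the mean RCS doubles the mean SNR. This is the effect exploited in the mismatched target scenario below.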


In addition to zero-mean white Gaussian receiver thermal noise, we also include the possible
effect of colored noise caused by unavoidable random effects present in the radar's operational
environment, such as atmospheric propagation effects or random platform motion. For each component
of the measurement vector (r, u, and v), we model the temporally correlated noise induced by the
environment with a first-order Gauss-Markov process; in other words, each component of the
measurement vector has its own pair of standard deviation and correlation time, parametrizing the
zero-mean colored Gaussian noise associated with it. We can include the effect of a “constant”
random bias by considering a first-order Gauss-Markov process with a very long correlation time.
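One way to generate the noise terms for a single measurement component is sketched below. A first-order Gauss-Markov sequence with its own (σ, τ) supplies the environmental noise. A second one, with a very long correlation time, acts as a nearly constant random bias. Both are added to the truth and to white receiver noise. All parameter values are hypothetical, and the exact discretization used here is an assumption rather than the report's stated method.

```python
import numpy as np


def gauss_markov(n, sigma, tau, dt, rng):
    """Stationary first-order Gauss-Markov sequence with std sigma and correlation time tau."""
    phi = np.exp(-dt / tau)
    drive = sigma * np.sqrt(-np.expm1(-2.0 * dt / tau))  # accurate even when tau >> dt
    x = np.empty(n)
    x[0] = sigma * rng.standard_normal()
    for k in range(1, n):
        x[k] = phi * x[k - 1] + drive * rng.standard_normal()
    return x


rng = np.random.default_rng(5)
n, dt = 20_000, 1.0
truth_r = np.linspace(1.0e6, 1.2e6, n)  # hypothetical true range history (m)
white = 5.0 * rng.standard_normal(n)  # receiver thermal noise, std 5 m (hypothetical)
colored = gauss_markov(n, sigma=3.0, tau=10.0, dt=dt, rng=rng)  # environmental noise (m)
bias = gauss_markov(n, sigma=2.0, tau=1.0e9, dt=dt, rng=rng)  # nearly constant random bias (m)
z_r = truth_r + white + colored + bias  # simulated range measurements
```

With τ = 10⁹ s, the bias term barely moves over the record but still differs randomly from run to run, which is the behavior the paragraph above describes.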

We simulate a sequence of radar measurements by adding to the truth data a term accounting
for the zero-mean white Gaussian receiver noise and a term accounting for the environmentally
induced zero-mean colored Gaussian noise. Given the sequence of simulated radar measurements,
along with their associated measurement error variances, we form a track using a textbook extended
Kalman filter [11]. We use the procedure outlined in Section 5.2.2 of [1] for track initialization.
We produce results for both the simulation and the actual observation (that is, we run a
“meta-simulation”). To examine the effectiveness of the proposed simulation validation procedure,
we consider the following scenarios:


1. Perfectly matched scenario: In the case of a perfectly matched scenario, the simulation
and the actual observation results are drawn from the same ensemble. In other words, the true
hypothesis is the null hypothesis, H_0, corresponding to the hypothesis that the simulation is
consistent with actual system performance. For this scenario, we only include the effect of
uncorrelated receiver thermal noise on the measurements, while excluding the effect of any
environmentally induced correlated noise. Results of the simulation validation process are
shown in Figures 6 and 7.

Figure 6. Time series of the macro tracker output validation metrics for the perfectly matched scenario,
plotted relative to the observed time series (shown in blue). The 10 Monte Carlo realizations are shown
in gray, while the simulation bounds are shown in red. See Table 1 for a definition of symbols. The total
position error, “pe,” and the square root of the trace of the position quadrant of the error covariance, “tp,”
are in meters; the total velocity error, “ve,” and the square root of the trace of the velocity quadrant of the
error covariance, “tv,” are in meters per second; and the NEES and NIS are dimensionless. In the case of
a perfectly matched scenario, the simulation is deemed to be consistent with actual system performance.

Figure 7. “Scorecard” for the perfectly matched scenario. The gray bars indicate the rejection indices, ρ,
for the 26 validation metrics listed in Table 1. The lines indicate the corresponding normalized rejection
thresholds, γ̄, for α = 0.01 (red), α = 0.05 (green), and α = 0.1 (blue). See Table 1 for a definition of
symbols. In the case of a perfectly matched scenario, the simulation is deemed to be consistent with actual
system performance.

Plots of the time series for the first six validation metrics listed in Table 1 are shown in
Figure 6. The time series are plotted relative to the time series of the observed validation
metric, shown in blue in Figure 6. When simulation results are consistent with actual system
performance, we expect the simulated time series of the validation metrics to cluster
symmetrically around the observed time series. As seen in Figure 6, this is indeed the case
for the matched scenario.

The scorecard summarizing the consistency of the simulation results with the observation is
shown in Figure 7. The gray bars indicate the rejection indices, ρ, corresponding to the 26
validation metrics listed in Table 1. The lines indicate the normalized rejection thresholds,
γ̄, corresponding to α = 0.01 (red), α = 0.05 (green), and α = 0.1 (blue). The variation of
the normalized rejection thresholds from validation metric to validation metric is due to the fact
that the time series of each validation metric has a unique correlation timescale; hence, the
number of independent samples is not necessarily the same for all the validation metrics, even
though the total number of samples might be the same. For α = 0.01, the rejection indices
for all validation metrics remain below the normalized rejection thresholds. Therefore, for
α = 0.01, the validation agent may safely declare the simulation results to be consistent with
the observation. For larger values of α, the validation metrics δx_v̇ and √P_ṙṙ (“dx6” and “sx4”
in Figure 7, respectively) fall above the normalized rejection thresholds, albeit not far
above. Acceptance or rejection of the simulation based on these two metrics will depend on
the validation agent's common sense and judgment. For example, the validation agent may
have reason to believe that numerical errors might have caused the rejection indices of these
validation metrics to exceed the normalized rejection thresholds corresponding to
the larger values of α, and thus pass the simulation. For this scenario, we would declare the
simulation consistent with actual system performance, despite the small transgression of
the δx_v̇ and √P_ṙṙ validation metrics for large values of the model maker's risk, α.

2. Mismatched target scenario: In the case of a mismatched target scenario, the simulation
and the observation results are statistically matched except for the target model. In other
words, the true hypothesis is the alternative hypothesis, H1, corresponding to the hypothesis
that the simulation is inconsistent with the actual system performance. To illustrate, we
consider the case when the mean value of the observed target RCS is a factor of 2 (3 dB)
larger than the simulated value. Plots of the SNR and RCS versus time are shown in Figure 8.
As seen in Figure 8 and as is evident from Eq. (16), the mean value of the observed target SNR
is also a factor of 2 larger than the simulated value. For this scenario, we also include only
the effect of uncorrelated receiver thermal noise on the measurements, while excluding the
effect of any environmentally induced correlated noise. Results of the simulation validation
process are shown in Figures 9 and 10.

Figure 8. Mismatched target SNR and RCS. The 10 Monte Carlo realizations are shown in gray, while the
observed time series are shown in blue.

Figure 9. Time series of the macro tracker output validation metrics for the mismatched target scenario
plotted relative to the observed time series (shown in blue). The 10 Monte Carlo realizations are shown
in gray, while the simulation bounds are shown in red. See Table 1 for a definition of symbols. The total
position error, “pe,” and the square root of the trace of the position quadrant of the error covariance, “tp,”
are in meters; the total velocity error, “ve,” and the square root of the trace of the velocity quadrant of the
error covariance, “tv,” are in meters per second; and the NEES and NIS are dimensionless. In the case of
a mismatched target scenario, the simulation is deemed to be inconsistent with actual system performance.

Figure 10. “Scorecard” for the mismatched target scenario. The gray bars indicate the rejection indices,
ρ, for the 26 validation metrics listed in Table 1. The lines indicate the corresponding normalized rejection
thresholds, γ, for α = 0.01 (red), α = 0.05 (green), and α = 0.1 (blue). See Table 1 for a definition of
symbols. In the case of a mismatched target scenario, the simulation is deemed to be inconsistent with actual
system performance.

Plots of the time series for the first six validation metrics listed in Table 1 are shown in
Figure 9. The simulated absolute position and velocity estimation errors (“pe” and “ve,”
respectively) appear to be consistent with the observation, while the simulated square roots
of the traces of the position and velocity quadrants of the error covariance matrix (“tp” and
“tv,” respectively) are inconsistent. The NEES and NIS also appear to be consistent. The
inconsistency in the error covariance is attributable to the mismatch in the simulated and
observed target SNR. Since the simulated SNR is on average a factor of 2 smaller than the
observed SNR, the simulated values of the error covariance are more “pessimistic.” This
explains why the simulated square roots of the traces of the position and velocity quadrants
of the error covariance matrix are above the observation values. As seen from Figure 9, even
though the simulated covariance matrix is inconsistent with the observed value, the covariance
mismatch was not sufficient to cause an inconsistency in the NEES and NIS, at least not for
this case.
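The factor-of-2 argument above can be made explicit with a short calculation. This is a sketch under two assumptions that this section does not state: that, per the radar range equation with all other terms held fixed, SNR scales linearly with RCS; and that the standard deviation of a thermal-noise-limited measurement, written here as σ_m (a symbol introduced only for illustration), scales as SNR^(-1/2).

```latex
\frac{\mathrm{SNR}_{\mathrm{obs}}}{\mathrm{SNR}_{\mathrm{sim}}}
  = \frac{\sigma_{\mathrm{RCS,obs}}}{\sigma_{\mathrm{RCS,sim}}} = 2,
\qquad 10\log_{10} 2 \approx 3.01~\mathrm{dB},
\qquad
\frac{\sigma_{m,\mathrm{sim}}}{\sigma_{m,\mathrm{obs}}}
  = \left(\frac{\mathrm{SNR}_{\mathrm{obs}}}{\mathrm{SNR}_{\mathrm{sim}}}\right)^{1/2}
  = \sqrt{2} \approx 1.41 .
```

Under these assumptions the simulated measurement error standard deviations are inflated by roughly √2 relative to the observation, and the filter error covariance built from them grows accordingly (though not necessarily in exact proportion, since process noise and the initial covariance also contribute). This is the direction of the “pessimistic” covariance described above.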

The scorecard summarizing the consistency of the simulation results with the observation is
shown in Figure 10. Again, we notice that the simulation fails for all the validation metrics
that depend on SNR, namely, the validation metrics corresponding to the measurement error
and the state estimation error covariances. We also note that since the modeling and simulation
remain exactly the same as in the perfectly matched scenario, the normalized rejection
thresholds, γ, which are computed based on the simulation results, also remain the same.

3. Mismatched environment scenario: In the case of a mismatched environment scenario,
the simulation and observation results are statistically matched except for the environmental
impact. In other words, the true hypothesis is the alternative hypothesis, H1. To illustrate,
we consider the case when the observed time series of the v component of the measurement
vector is corrupted by colored noise. The colored noise is modeled with a first-order
Gauss-Markov process with a standard deviation of 1 msin and a correlation time of 130 s. Plots of
the measurement errors for this scenario are shown in Figure 11. Results of the simulation
validation process are shown in Figures 12 and 13.
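A colored-noise corruption of this kind can be sketched as follows. The sketch assumes the standard exact discretization of a first-order Gauss-Markov process with stationary standard deviation σ and correlation time τ; the function name `gauss_markov` and the sample interval `dt` are hypothetical, and the report does not state how its own noise sequence was generated.

```python
import math
import random
from typing import List, Optional


def gauss_markov(n_samples: int, dt: float, sigma: float, tau: float,
                 seed: Optional[int] = None) -> List[float]:
    """First-order Gauss-Markov sequence with stationary std `sigma` and correlation time `tau`.

    Uses the exact discretization
        x[k+1] = phi * x[k] + sigma * sqrt(1 - phi**2) * w[k],  phi = exp(-dt / tau),
    with w[k] independent standard normal draws, so the autocorrelation at lag k*dt
    is exp(-k * dt / tau).
    """
    rng = random.Random(seed)
    phi = math.exp(-dt / tau)
    drive = sigma * math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, sigma)  # start in the stationary distribution
    out = []
    for _ in range(n_samples):
        out.append(x)
        x = phi * x + drive * rng.gauss(0.0, 1.0)
    return out


if __name__ == "__main__":
    # sigma = 1 msin and tau = 130 s follow the text; dt = 1 s and 600 samples are hypothetical.
    v_error = gauss_markov(n_samples=600, dt=1.0, sigma=1.0, tau=130.0, seed=0)
    print(min(v_error), max(v_error))
```

With τ = 130 s, samples a few seconds apart are almost perfectly correlated (φ = exp(-1/130) ≈ 0.992 for dt = 1 s), so over short spans the corruption behaves like a slowly varying bias rather than like white noise.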

Plots of the time series for the first six validation metrics listed in Table 1 are shown in
Figure 12. As seen in the figure, for this scenario, the simulated total position and velocity
estimation errors (“pe” and “ve,” respectively) appear to be inconsistent with the observation,
while the square roots of the traces of the position and velocity quadrants of the error
covariance matrix (“tp” and “tv,” respectively) are consistent. The inconsistency in the
state estimation error is obviously caused by the bias in the v component of the measurement
error, which is not accounted for by the simulation. Since both the measurement and
the state estimation error covariances depend mainly on SNR, they are not affected by the
time-varying bias shown in Figure 11. However, for more severe biases, the state estimation
error covariance can also be significantly impacted. This is due to the fact that for nonlinear
problems (such as tracking a ballistic target), the Jacobians in the expressions for the error
covariance in the prediction and update steps of the extended Kalman filter depend on the
target state estimate [11], which, in turn, is directly impacted by measurement biases in the
update step of the Kalman filter. Since the simulated state estimation errors are inconsistent
with the observation, NEES is also seen to be inconsistent in Figure 12.
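For reference, the NEES of an estimate is the quadratic form eᵀP⁻¹e, where e is the state estimation error and P is the filter error covariance; NIS is the same form applied to the innovation and its covariance. The sketch below uses that standard definition (it does not reproduce the Table 1 definitions) with a hypothetical diagonal covariance to show why an unmodeled measurement bias inflates NEES: for an n-dimensional zero-mean Gaussian error with covariance P the mean NEES is n, and a bias b adds bᵀP⁻¹b. All function names are illustrative.

```python
import random


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def nees(err, cov):
    """Normalized estimation error squared: err^T cov^{-1} err."""
    return sum(e * y for e, y in zip(err, solve(cov, err)))


def mean_nees(bias, sigmas, n_trials=20000, seed=0):
    """Monte Carlo mean NEES for errors e = bias + N(0, diag(sigmas**2)) (hypothetical setup)."""
    rng = random.Random(seed)
    cov = [[s * s if i == j else 0.0 for j in range(len(sigmas))] for i, s in enumerate(sigmas)]
    total = 0.0
    for _ in range(n_trials):
        err = [b + s * rng.gauss(0.0, 1.0) for b, s in zip(bias, sigmas)]
        total += nees(err, cov)
    return total / n_trials


if __name__ == "__main__":
    sigmas = [1.0, 2.0, 0.5]  # hypothetical per-axis error standard deviations
    print(mean_nees([0.0, 0.0, 0.0], sigmas))  # close to 3, the state dimension
    print(mean_nees([2.0, 0.0, 0.0], sigmas))  # close to 3 + (2 / 1)**2 = 7
```

Because the bias term bᵀP⁻¹b does not average out over realizations, a systematic measurement error of this kind pushes NEES outside the simulation bounds while leaving the covariance itself unchanged, which is consistent with the pattern described for Figure 12.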

The scorecard summarizing the consistency of the simulation results with the observation
is shown in Figure 13. Again, we notice that the simulation fails for all validation metrics
that are impacted by the v measurement error bias, specifically, the validation metrics
corresponding to the state estimation error. Also, we again note that since the modeling and
simulation remain exactly the same as in the previous two scenarios, the normalized rejection
thresholds, γ, which are computed based on the simulation results, also remain the same.
Figure 11. Mismatched v measurement error. The 10 Monte Carlo realizations are shown in gray, while
the observed time series are shown in blue.
Figure 12. Time series of the macro tracker output validation metrics for the mismatched environment
scenario plotted relative to the observed time series (shown in blue). The 10 Monte Carlo realizations are
shown in gray, while the simulation bounds are shown in red. See Table 1 for a definition of symbols. The
total position error, “pe,” and the square root of the trace of the position quadrant of the error covariance,
“tp,” are in meters; the total velocity error, “ve,” and the square root of the trace of the velocity quadrant
of the error covariance, “tv,” are in meters per second; and the NEES and NIS are dimensionless. In the
case of a mismatched environment scenario, the simulation is deemed to be inconsistent with actual system
performance.
Figure 13. “Scorecard” for the mismatched environment scenario. The gray bars indicate the rejection
indices, ρ, for the 26 validation metrics listed in Table 1. The lines indicate the corresponding normalized
rejection thresholds, γ, for α = 0.01 (red), α = 0.05 (green), and α = 0.1 (blue). See Table 1 for a definition
of symbols. In the case of a mismatched environment scenario, the simulation is deemed to be inconsistent
with actual system performance.
6. SUMMARY


The procedure proposed in this report for validating the modeling and simulation of a generic
tracking radar is based on a statistical hypothesis test. The two hypotheses are (1) the hypothesis
that the simulation is consistent with actual system performance, the null hypothesis, H0, and
(2) the hypothesis that the simulation is inconsistent with actual system performance, the
alternative hypothesis, H1. The procedure is cognizant of the model maker’s risk, α, and the model
user’s risk, β, which correspond to the probabilities of Type I and Type II errors, respectively.

The proposed acceptability criteria are anchored to single discrete-event observations. Since,
in general, the observed behavior is not repeatable, the probability density function necessary for
the computation of the model user’s risk, β, is not accessible. However, it is always possible to
derive any desired statistics for the simulation results through multiple Monte Carlo realizations;
thus, it is always possible to derive appropriate rejection thresholds, γ, based on pre-specified values
of the model maker’s risk, α. The model maker’s risk is used as an adjustable parameter of the
validation procedure, yielding different rejection thresholds. Even though a “receiver operating
characteristic” or “ROC” curve, providing the trade-off between the model maker’s risk, α, and
the model user’s risk, β, cannot be computed explicitly (due to the unavailability of the probability
density function of the observed behavior), it is nevertheless understood that smaller values of the
model maker’s risk, α, give rise to larger values of the model user’s risk, β. Thus, a family
of rejection thresholds corresponding to different values of the model maker’s risk, α, ought to be
considered in an effort to minimize the model user’s risk, β.
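Although β cannot be computed for a real observation, the direction of the trade-off can be illustrated for a purely hypothetical alternative. The sketch below is not this report’s Eq. (2) or Eq. (10); it assumes that each of N_I independent observed samples falls outside the simulation bounds with probability p0 under H0, that the resulting count is binomial, and that under an assumed alternative the exceedance probability is some larger p1. The values of p0, p1, and N_I are made up, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k + 1))


def threshold(alpha: float, n_ind: int, p0: float) -> int:
    """Smallest count gamma with P(X > gamma) <= alpha under H0 (X ~ Binomial(n_ind, p0))."""
    for k in range(n_ind + 1):
        if 1.0 - binom_cdf(k, n_ind, p0) <= alpha:
            return k
    return n_ind


def user_risk(alpha: float, n_ind: int, p0: float, p1: float) -> float:
    """beta = P(count <= gamma | H1), assuming the count is Binomial(n_ind, p1) under H1."""
    return binom_cdf(threshold(alpha, n_ind, p0), n_ind, p1)


if __name__ == "__main__":
    # p0, p1, and n_ind are made-up illustration values.
    for alpha in (0.01, 0.05, 0.1):
        gamma = threshold(alpha, n_ind=10, p0=0.1)
        beta = user_risk(alpha, n_ind=10, p0=0.1, p1=0.3)
        print(f"alpha={alpha:<5} gamma={gamma} beta={beta:.2f}")
```

With `n_ind=10`, `p0=0.1`, and `p1=0.3`, the thresholds for α = 0.01, 0.05, and 0.1 are 4, 3, and 2 exceedances, and β falls from about 0.85 to 0.65 to 0.38 as α grows, which is the qualitative behavior the text relies on when it recommends examining a family of thresholds.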

The modeling and simulation validation procedure proposed in this report is performed
independently on a set of validation metrics. A list of 26 validation metrics sufficient for validating the
modeling and simulation of a phased-array radar tracking a ballistic target is given in Table 1. The
list can be readily expanded to account for models of tracking nonballistic targets with other types
of sensors. Conversely, for many applications, not all of the validation metrics listed in Table 1
need be considered. For example, in a multiple sensor configuration, where only the total position
and velocity errors in shared track data are of concern, it may be sufficient to examine only the
first six items listed in Table 1.

Many of the validation metrics listed in Table 1 are not statistically independent. For
example, the validation metrics corresponding to the measurement error and state estimation error
covariances are dependent on the signal-to-noise ratio. The correlation among select validation
metrics can serve as a diagnostic tool helping to identify the root cause of the failure of a given
modeling and simulation product. The numerical experiment conducted in Section 5 demonstrates
the utility of the correlation among select validation metrics. All validation metrics listed in Table 1
come in the form of time series; hence, any temporal correlation present in the time series must
also be taken into account.

The steps taken in the proposed validation procedure are summarized as follows. For each
validation metric, we count the number of samples of the observed time series that fall outside of
bounds prescribed by N_MC Monte Carlo realizations of the simulated time series. The bounds at
each time index correspond to the minimum and maximum values of the Monte Carlo realizations.
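The bounding and counting step can be sketched in a few lines of Python. The function names are hypothetical, the inputs are assumed to be equal-length lists sampled at common time indices, and the rejection index is expressed as a percentage of all observed samples, matching the 0-to-100 scale used in the scorecards.

```python
from __future__ import annotations

from typing import Sequence


def simulation_bounds(realizations: Sequence[Sequence[float]]) -> tuple[list[float], list[float]]:
    """Pointwise minimum and maximum over N_MC Monte Carlo realizations of one metric."""
    lengths = {len(r) for r in realizations}
    if len(lengths) != 1:
        raise ValueError("need at least one realization, all of the same length")
    lower = [min(values) for values in zip(*realizations)]
    upper = [max(values) for values in zip(*realizations)]
    return lower, upper


def count_outside(observed: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> int:
    """Number of observed samples strictly below the lower or above the upper bound."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed series and bounds must have the same length")
    return sum(1 for x, lo, hi in zip(observed, lower, upper) if x < lo or x > hi)


def rejection_index(observed: Sequence[float], realizations: Sequence[Sequence[float]]) -> float:
    """Rejection index rho: percentage (0 to 100) of observed samples outside the bounds."""
    lower, upper = simulation_bounds(realizations)
    return 100.0 * count_outside(observed, lower, upper) / len(observed)
```

For example, with the three realizations `[1.0, 2.0, 3.0]`, `[1.5, 2.5, 2.0]`, and `[0.5, 1.5, 4.0]`, the bounds are `[0.5, 1.5, 2.0]` and `[1.5, 2.5, 4.0]`, so the observation `[1.0, 3.0, 4.5]` has two of its three samples outside and a rejection index of about 66.7.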


Subsequently, if the number of observed samples that are outside of the simulation bounds is
above a pre-computed rejection threshold, we declare the simulated time series of the particular
validation metric under scrutiny as inconsistent with the observed time series. For each validation
metric, the rejection threshold, γ, is computed using Eq. (10). The rejection threshold depends on
(1) the model maker’s risk, α, (2) the number, N_I, of independent samples in the simulated time
series, and (3) the number, N_MC, of Monte Carlo realizations through the use of Eq. (2). The
interrelationship between γ, N_I, and N_MC for a given α is explored in Figure 4.
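Because Eq. (2) and Eq. (10) are not reproduced in this section, the following is only a sketch of how such a threshold could be computed, under two stated assumptions: (1) if the observation and the N_MC realizations are exchangeable draws from the same continuous distribution, the observed sample falls outside the min-max bounds with probability p0 = 2/(N_MC + 1); and (2) the outside count over N_I independent samples is binomial. The threshold γ is then the smallest count whose upper-tail probability does not exceed α. Normalizing by N_I to obtain a percentage is also an assumption, and the function names are hypothetical.

```python
from math import comb


def exceedance_probability(n_mc: int) -> float:
    """P(observed sample falls outside [min, max] of n_mc realizations) under H0.

    Assumes continuous, exchangeable samples: the observation is equally likely to take
    any of the n_mc + 1 ranks, and only the lowest and highest ranks lie outside.
    """
    return 2.0 / (n_mc + 1)


def rejection_threshold(alpha: float, n_ind: int, n_mc: int) -> int:
    """Smallest gamma with P(X > gamma) <= alpha, where X ~ Binomial(n_ind, p0)."""
    p0 = exceedance_probability(n_mc)
    cdf = 0.0
    for k in range(n_ind + 1):
        cdf += comb(n_ind, k) * p0**k * (1.0 - p0) ** (n_ind - k)
        if 1.0 - cdf <= alpha:
            return k
    return n_ind


def normalized_threshold(alpha: float, n_ind: int, n_mc: int) -> float:
    """Threshold as a percentage (0 to 100) of the independent samples."""
    return 100.0 * rejection_threshold(alpha, n_ind, n_mc) / n_ind


if __name__ == "__main__":
    # N_MC = 10 matches the realizations plotted in this section; the N_I values are hypothetical.
    for n_ind in (10, 50, 200):
        print(n_ind, [round(normalized_threshold(a, n_ind, 10), 1) for a in (0.01, 0.05, 0.1)])
```

For example, with N_MC = 19 (so p0 = 0.1) and N_I = 10, this gives γ = 4, 3, and 2 for α = 0.01, 0.05, and 0.1. Holding N_MC fixed while changing N_I changes the normalized threshold, which is one way to see why the thresholds in the scorecards vary from metric to metric.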

The number, N_I, of independent samples can be obtained using any of the techniques discussed
in Section 3. In order to model the outcome of the aforementioned counting process as a binomial
random variable, the samples must be statistically independent. Statistical independence can be
ensured by devising a counting procedure that repeatedly divides the time series into uncorrelated
segments and then picks independent samples from each segment. We argued that the statistical
dependence that would exist in counting all samples can be accounted for in the computation of
the rejection threshold. Based on a Monte Carlo analysis, we showed in Section 3 that rejection
indices, ρ, computed based on counting all samples, rarely exceed the compensated normalized
rejection thresholds, γ. We have thus opted for the simpler approach of accounting for the statistical
dependence of the time series in the computation of the rejection threshold.
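The Section 3 techniques are not reproduced here. As one simple possibility, the sketch below takes the correlation length to be the first lag at which the sample autocorrelation falls below 1/e, divides the series into segments of that length, and keeps one sample per segment. The 1/e criterion and the function names are assumptions for illustration.

```python
import math
from typing import Optional, Sequence


def autocorrelation(x: Sequence[float], lag: int) -> float:
    """Biased sample autocorrelation coefficient of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0.0:
        return 1.0  # a constant series is treated as fully correlated
    return sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / var


def correlation_length(x: Sequence[float], max_lag: Optional[int] = None) -> int:
    """First lag at which the autocorrelation drops below 1/e (assumed criterion)."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    if max_lag is None:
        max_lag = len(x) - 1
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) < math.exp(-1.0):
            return lag
    return max_lag


def independent_samples(x: Sequence[float]) -> list:
    """Keep one sample (the first) from each segment of one correlation length."""
    return list(x[:: correlation_length(x)])
```

N_I is then `len(independent_samples(x))`. For a white sequence every sample is kept, while a series that changes sign only every few samples is thinned accordingly.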

The last step of the proposed validation procedure consists of summarizing the results of
the above computations for each of the validation metrics listed in Table 1 in a scorecard. The
scorecard reveals any cross-correlation that exists among select validation metrics, thus serving
as a diagnostic tool. For each discrete-event observation, the scorecard contains a list of rejection
indices for the different validation metrics, with each rejection index (expressed as a number
between 0 and 100) denoting the percentage of the samples of the observed time series of the associated
validation metric that are outside of the simulation bounds. Normalized rejection thresholds for the
different validation metrics, likewise expressed as numbers between 0 and 100, are also included in the
scorecard. Furthermore, we require that a family of normalized rejection thresholds, corresponding to
different values of the model maker’s risk, α, be included in the scorecard. Examples of scorecards
are presented in Section 5. Specifically, Figures 7, 10, and 13 illustrate useful graphical
representations of scorecards obtained for the different scenarios we studied in Section 5. Use of such graphical
displays of the simulation results is encouraged, as they readily reveal the cross-correlation among
select validation metrics. Thus, using sound judgement and common sense, a validation agent may
use such scorecards obtained for various discrete-event observations to accept or reject a given
modeling and simulation product. More importantly, the scorecards also serve as a first step
toward identifying problems in a given product and thus pave the road to modeling and simulation
improvement.
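A plain-text rendering of such a scorecard can be sketched as follows. The rejection indices and thresholds in the usage lines are hypothetical inputs, not values read from Figures 7, 10, or 13; in practice the thresholds would come from a computation such as Eq. (10) for each α in the family. An asterisk marks each α at which a metric’s rejection index exceeds its normalized threshold. The class and function names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str                       # metric label, e.g. a Table 1 symbol such as "pe"
    rho: float                      # rejection index, percent of observed samples outside the bounds
    thresholds: dict[float, float]  # model maker's risk alpha -> normalized threshold, percent


def rejected_at(result: MetricResult) -> list[float]:
    """Values of alpha at which the rejection index exceeds the normalized threshold."""
    return sorted(a for a, g in result.thresholds.items() if result.rho > g)


def render_scorecard(results: list[MetricResult], alphas: list[float]) -> list[str]:
    """Text scorecard: one row per metric; '*' marks a threshold that rho exceeds."""
    header = f"{'metric':<8}{'rho':>7}" + "".join(f"{'a=' + str(a):>10}" for a in alphas)
    rows = [header]
    for r in results:
        cells = "".join(
            f"{r.thresholds[a]:>9.1f}{'*' if r.rho > r.thresholds[a] else ' '}" for a in alphas
        )
        rows.append(f"{r.name:<8}{r.rho:>7.1f}{cells}")
    return rows


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    family = [0.01, 0.05, 0.1]
    card = [
        MetricResult("pe", 2.0, {0.01: 10.0, 0.05: 8.0, 0.1: 6.0}),
        MetricResult("tp", 9.0, {0.01: 10.0, 0.05: 8.0, 0.1: 6.0}),
    ]
    print("\n".join(render_scorecard(card, family)))
    for r in card:
        print(r.name, "rejected at alpha =", rejected_at(r))
```

Listing, per metric, the α values at which it is rejected mirrors the matched-scenario reasoning in Section 5: a metric that passes at α = 0.01 but fails only at larger α is a marginal case left to the validation agent’s judgement.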

The modeling and simulation acceptability criteria proposed in this report focused mainly
on validating the modeling and simulation of the tracking capability of a generic radar. However,
these criteria can be generalized to include other radar functions or other types of sensors, such as
optical or IR sensors. Furthermore, the proposed approach can be readily extended to validating the
modeling and simulation of the targets themselves. Also, the proposed approach can be extended
to validating the modeling and simulation of the operational environment of a given sensor, which
directly impacts the performance of all sensor functions. All such problems involve the modeling
and simulation of time series, and we believe the techniques proposed in this report are well suited
to the validation of the modeling and simulation of any simulated time series based on single
discrete-event observations.